Goto

Collaborating Authors

 pac-bayesian analysis


Subgroup Generalization and Fairness of Graph Neural Networks

Neural Information Processing Systems

Despite enormous successful applications of graph neural networks (GNNs), theoretical understanding of their generalization ability, especially for node-level tasks where data are not independent and identically-distributed (IID), has been sparse. The theoretical investigation of the generalization performance is beneficial for understanding fundamental issues (such as fairness) of GNN models and designing better learning methods. In this paper, we present a novel PAC-Bayesian analysis for GNNs under a non-IID semi-supervised learning setup. Moreover, we analyze the generalization performances on different subgroups of unlabeled nodes, which allows us to further study an accuracy-(dis)parity-style (un)fairness of GNNs from a theoretical perspective. Under reasonable assumptions, we demonstrate that the distance between a test subgroup and the training set can be a key factor affecting the GNN performance on that subgroup, which calls special attention to the training node selection for fair learning. Experiments across multiple GNN models and datasets support our theoretical results4.


A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent

Neural Information Processing Systems

We study the generalization error of randomized learning algorithms -- focusing on stochastic gradient descent (SGD) -- using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm's random hyperparameters, including distributions that depend on the training data.


Reviews: A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent

Neural Information Processing Systems

The paper opens the way to a new use of PAC-Bayesian theory, by combining PAC-Bayes with algorithmic stability to study stochastic optimization algorithms. The obtained probabilistic bounds are then used to inspire adaptive sampling strategies, studied empirically in a deep learning scenario. The paper is well written, and the proofs are non-trivial. It contains several clever ideas, namely the use of algorithmic stability to bound the complexity term inside PAC-Bayesian bounds. It's also fruitful to express the prior and posterior distributions over the sequences of indexes used by a stochastic gradient descent algorithm.



PAC-Bayes-Empirical-Bernstein Inequality

Neural Information Processing Systems

We present a PAC-Bayes-Empirical-Bernstein inequality. The inequality is based on a combination of the PAC-Bayesian bounding technique with an Empirical Bernstein bound. We show that when the empirical variance is significantly smaller than the empirical loss the PAC-Bayes-Empirical-Bernstein inequality is significantly tighter than the PAC-Bayes-kl inequality of Seeger (2002) and otherwise it is comparable. Our theoretical analysis is confirmed empirically on a synthetic example and several UCI datasets. The PAC-Bayes-Empirical-Bernstein inequality is an interesting example of an application of the PAC-Bayesian bounding technique to self-bounding functions.


PAC-Bayesian Analysis of Contextual Bandits

Neural Information Processing Systems

We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) N goes as \sqrt{N I_{\rho_t}(S;A)}, where I_{\rho_t}(S;A) is the mutual information between states and actions (the side information) used by the algorithm at round t . If the algorithm uses all the side information, the regret bound scales as \sqrt{N \ln K}, where K is the number of actions (arms). However, if the side information I_{\rho_t}(S;A) is not fully used, the regret bound is significantly tighter. In the extreme case, when I_{\rho_t}(S;A) 0, the dependence on the number of states reduces from linear to logarithmic.


A PAC-Bayesian Analysis of Distance-Based Classifiers: Why Nearest-Neighbour works!

arXiv.org Machine Learning

Abstract We present PAC-Bayesian bounds for the generalisation error of the K-nearest-neighbour classifier (K-NN). This is achieved by casting the K-NN classifier into a kernel space framework in the limit of vanishing kernel bandwidth. We establish a relation between prior measures over the coefficients in the kernel expansion and the induced measure on the weight vectors in kernel space. Defining a sparse prior over the coefficients allows the application of a PAC-Bayesian folk theorem that leads to a generalisation bound that is a function of the number of redundant training examples: those that can be left out without changing the solution. The presented bound requires to quantify a prior belief in the sparseness of the solution and is evaluated after learning when the actual redundancy level is known. Even for small sample size (m ~ 100) the bound gives non-trivial results when both the expected sparseness and the actual redundancy are high.


PAC-Bayesian Analysis of Contextual Bandits

Neural Information Processing Systems

We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits with side information (also known as contextual bandits). The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$. If the algorithm uses all the side information, the regret bound scales as $\sqrt{N \ln K}$, where $K$ is the number of actions (arms). However, if the side information $I_{\rho_t}(S;A)$ is not fully used, the regret bound is significantly tighter. In the extreme case, when $I_{\rho_t}(S;A) 0$, the dependence on the number of states reduces from linear to logarithmic. Our analysis allows to provide the algorithm large amount of side information, let the algorithm to decide which side information is relevant for the task, and penalize the algorithm only for the side information that it is using de facto.


A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent

Neural Information Processing Systems

We study the generalization error of randomized learning algorithms -- focusing on stochastic gradient descent (SGD) -- using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm's random hyperparameters, including distributions that depend on the training data. We analyze this algorithm in the context of our generalization bounds and evaluate it on a benchmark dataset. Our experiments demonstrate that adaptive sampling can reduce empirical risk faster than uniform sampling while also improving out-of-sample accuracy. Papers published at the Neural Information Processing Systems Conference.


PAC-Bayesian Transportation Bound

arXiv.org Machine Learning

We present a new generalization error bound, the \emph{PAC-Bayesian transportation bound}, unifying the PAC-Bayesian analysis and the generic chaining method in view of the optimal transportation. The proposed bound is the first PAC-Bayesian framework that characterizes the cost of de-randomization of stochastic predictors facing any Lipschitz loss functions. As an example, we give an upper bound on the de-randomization cost of spectrally normalized neural networks~(NNs) to evaluate how much randomness contributes to the generalization of NNs.